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We present details of our investigations of the Parallel Tempering algorithm. We consider the 
application of action matching technology to the selection of parameters. We then present a simple 
model of the autocorrelations for a particular parallel tempered system. Finally we present results 
from applying the algorithm to lattice QCD with 0(a)-improved dynamical Wilson Fermions for 
twin sub-ensemble systems. 
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QO . I- INTRODUCTION 

CTn _ 

The computational cost of lattice QCD has always been enormous. During the last few years the power of super- 
computers has grown immensely but simulations with dynamical fermions are still very time consuming. 

One of the most popular algorithms for dynamical fermion simulations is the Hybrid Monte Carlo (HMC) algorithm 
JlJ] . However it has been suggested that HMC is not very efficient at decorrelating some long range observables such 
as the topological charge ||. On the other hand, results from the SESAM collaboration j| indicate that HMC 
' simulations using Wilson fermions seem to tunnel between topological sectors at an adequate rate. The results of 
SESAM indicate an autocorrelation time for the topological charge of about 50 HMC trajectories. 

With such high computational costs it is always necessary to keep an eye open for alternative algorithms. Parallel 
Tempering (PT) or Exchange Monte Carlo was proposed in jl| to assist decorrelation in spin-glass systems. A lucid 
description of PT and related algorithms such as Simulated Tempering and their applications to spin-glass and other 
systems may be found in ||[|. 

Recently PT has been applied to simulations of lattice QCD with staggered fermions [0 and this preliminary 
study indicated that the autocorrelation times for some observables were significantly improved over the normal HMC 
results. 

In this paper we present our study of the PT algorithm using 2 flavours of degenerate 0(a)-improved Wilson 
fermions |g] with a non-perturbatively determined coefficient 

PT simulates several lattice QCD ensembles concurrently, hereafter referred to as sub-ensembles, with different 
simulation parameters. PT exploits the fact that the equilibrium distributions of the configurations in individual 
sub-ensembles have an overlap, and occasionally tries to swap configurations between pairs of sub-ensembles, while 
keeping all sub-ensembles in equilibrium. This is done in such a way that the factorisation of the joint equilibrium 
distribution of configurations into the individual distributions for each sub-ensemble is not disturbed by the swapping. 

The acceptance of these swap attempts depends on how close the sub-ensembles are to each other in parameter 
space. The concept of distance in parameter space is formalised in |Io|-^2]| by the machinery of action and observable 
matching. In theory, this technology should allow the selection of an optimal set of parameters to maximise the swap 
acceptance rate between the sub-ensembles. 

Another possibility is to use the action matching technology to define curves in parameter space on which some 
observable such as ro |]l3f is constant. PT can, in principle be used to simulate numerous points on such a curve in 
one simulation. However it must be stressed that this scenario is different from the one above. Matching observables 
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is not the same as matching the action |12j] . Hence in this case one does not in general have as good control over the 
swap acceptance rate as in the situation outlined previously. 
The remainder of this paper is organised as follows. 

Our particular variant of the PT algorithm is described in detail in the following section, where we show that it 
satisfies detailed balance and present a formula for the acceptance rate of the swap attempts. We then relate this 
formula to the distance in parameter space as defined in the context of action matching technology. 

The swapping of configurations between sub-ensembles is expected to reduce the autocorrelation times of observables 
within individual sub-ensembles with respect to their HMC autocorrelation times. In section III we discuss the simple 
case of a PT system consisting of two sub-ensembles. We suggest a model for the autocorrelation function in the PT 
sub-ensembles in terms of that of the original HMC ensembles. 

Our simulations are discussed in Section IV and our results are presented in Section [y| We show that indeed our 
acceptance rate formula of section |n| is borne out by the simulation results. We estimate the autocorrelation time of 
the plaqu ette for several swap acceptance rates and compare these estimates with the prediction of the model outlined 
in section 
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Our summary and conclusions are presented in Section VI. 



II. THE PARALLEL TEMPERING ALGORITHM 



Notation 

Let each sub-ensemble be labelled by an index i and let the phase space of sub-ensemble i be T^. Each sub-ensemble 
has an action Si which depends upon the set of parameters and the fields of the sub-ensemble. 
We simulate two flavours of dynamical fermions using the standard pscudofcrmionic action: 

S t = -^Wn(U) + ^ (M^Kuc^MiK^a))' 1 <j> (1) 

where Wo is the Wilson plaquette action, U are the gauge fields, cf> are the pseudofermion fields, and M is the 0(a)- 
improved fermion matrix with hopping parameter k and clover coefficient c. In addition, for HMC algorithms we need 
to introduce momentum fields iTi and construct Hamiltonian functions TLi = nf + Si. A state in sub-ensemble i is 
then represented by the triple a.; = (Ui, 7^, </>j) while the parameter set for sub-ensemble i is the triple of real numbers 
(f3i, Ki,Ci). Note that the subscript i serves only to distinguish ensembles and will be dropped when discussing a single 
sub-ensemble. 

Each sub-ensemble has the phase space 

Ti = {Ui} ® {7Ti} ® {&}. (2) 

We note at this stage that all the Ti are identical and the only distinguishing features of individual ensembles are 
their parameter sets and quantities which depend upon these such as Si or Hi- 

A PT simulation state is thus the collection of states {a,|« = l...n}, where n is the number of sub-ensembles. The 
overall PT phase space is the direct product of the phase spaces of the sub-ensembles 



PT 
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Detailed Balance 

In a PT simulation one needs to construct a Markov process which has (joint) equilibrium probability distribution: 

P eq = TJ^ q([/)7r0) (4) 

i 

where P eq (£7, 7r, 0) is the desired equilibrium probability distribution of the individual sub-ensemble i. In our case 

Pr(U,n,<t>) = ±e-^ u ^ (5) 

Zi = J [dU] [ck] [d0] [dtf] e -««(^). (6) 
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Equation |j formalises our notion of simulating ensembles independently. To be more precise, the Markov steps 
within any individual sub-ensemble are independent of those in the others, but the resulting sub-ensembles are not 
independent as they are coupled by the swapping steps. However the overall joint equilibrium distribution of the PT 
system is not affected by the swapping, and remains the product of the individual equilibrium distributions of the 
sub-ensembles. 

We define two kinds of Markov transitions: 

f . Transitions within a single ensemble: These transitions can be made with any desired Markovian update 
procedure that satisfies detailed balance with respect to P eq for its sub-ensemble. In our case such transitions 
are made with HMC. We refer to the set of HMC trajectories that are performed between swaps as an HMC 
step. 

2. Transitions between sub— ensembles: These transitions are used to connect the phase spaces of the sub- 
ensembles. Such a transition would be a proposed swap between any two sub-ensembles i and j. Let a be 
a configuration in sub-ensemble i and b be a configuration in sub-ensemble j. The swap transition can be 
denoted: 

, J (b, a) if swap is accepted 

^ ' ' 1 (a, b) if swap is rejected. ^ ' 

Let us denote by P s (i,j) the probability that the swap succeeds. The detailed balance condition is: 

P s (i,j)e- H ^e- n ^ = P s (j,i)e- u ^e- n ^ (8) 

as the contributions from the other ensembles cancel on both sides. A suitable choice for P s is the simple 
Metropolis |l4| acceptance probability: 

P s (i,j) = min(l,e- AW ) (9) 

where 

AH = {H 3 {a) + H^b)} - {H t {a) + H 3 {b)} (10) 

which satisfies the detailed balance condition by construction. 

The required overall Markov transition should be constructed of a number of both kinds of transitions. HMC steps 
within all the sub-ensembles are necessary and sufficient for convergence. Transitions between sub-ensembles are not 
essential but without them PT would basically be the same as running several independent HMC simulations. 

Swap Acceptance Rate 

Any extra decorrelation of observables in PT over and above normal HMC must necessarily come from the swapping 
transitions. Control of the acceptance rate for swapping transitions is therefore important. The swapping probability 
is determined by the energy change AH as in (^). The acceptance rate for Metropolis- like algorithms of this kind is 
easily shown to be [|l5| 

(AHerfcQv^AW))- (11) 

Here (AH) is the average of AH over all swap attempts, and {A) is the average acceptance rate of the swap attempts. 
Action Matching 

The action matching formalism outlined in JTo[ ] formalises the meaning of distance in parameter space. We review 
here the salient points of the discussion. 

Let Si[U] and Si[U] be the actions of two lattice gauge theories with the same gauge configuration space, so that 
the partition function of each is: 



Zi = J [dU]exp{-Si[U]} (12) 
and the expectation of an observable O in ensemble i is: 
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(0)t = ^ J [dU}G(U)exp{-Si[U]}. (13) 

Naturally, using actions dependent on other fields will complicate the integration measure. If one deals with pseud- 
ofcrmions for example, these would have to be integrated also, both in the partition function Z and in the expectation 
value of observables. We will not explicitly write out integrations over pseudofcrmions in expectation values, except 
cases where such an ommission may lead to ambiguities. 

The expectation of O in the other ensemble is given to first order in a cumulant expansion by: 

(0> 2 = (0>i + (0Ai2>i + ... (14) 

where A 12 = Si - S 2 and 6 = O - (0) etc. 

The distance between the two actions is defined as the variance 

d = a 2 (A 12 ) = <A? a > (15) 

where the expectation is to be evaluated in either sub-ensemble. 
Three matching conditions have been identified: 

• Match the values of observables i.e. require that (0)i = (0) 2 

• Minimise d 

• Maximise the acceptance in an exact algorithm for 52 constructed via accept/reject applied to configurations 
generated with action S\. 

It was shown in ||lo|| that the last two conditions are equivalent to lowest order in a cumulant expansion. Under 
special circumstances the first condition is also equivalent to the other two to lowest order. The prescriptions differ 
in a calculable way at the next order. 

We are now ready to make the connection between PT and the formalism of action matching. We note that the 
energy difference before and after a PT swap attempt is 

8 = AH. (16) 

The momentum fields cancel exactly in the Hamiltonian terms and one can deal directly with the actions: 

5 = Si(U 2 ,4>2) + S 2 (Ui,4>i) - SxiUufa) - S 2 {U 2 ,4>2) (17) 

Collecting the terms depending on the same fields one obtains: 

6 = Ai 2 (U 2 ,<j> 2 )-Ai 2 (Ui,<j>i). (18) 

We now identify 6 with —6 in (3.15) in |l(J. Following the analysis of [^0) one may obtain the acceptance rate formula 
of the action matching mechanism 



(A)=erfc[-y/o*(Kuj) . (19) 



a 2 (A 12 ) = {AH) w \a 2 {AH) (20) 



One can then deduce that 



where the second approximate equality is required to derive the acceptance rate ([11]). 

Our PT parameters were tuned using the action matching technology to maximise the acceptance between two 
subensembles using the action 

Si = -foWn - Ti (21) 

with 

Ti = Tr HQ; 1 ) (22) 
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and 

Q, = (M^K^Mim))' 1 . (23) 

The tuning was carried out before performing the PT simulation using configurations from a preliminary HMC run 
at the desired reference parameter set. 

However our PT simulations were carried out using the action 

S, = -f3 l Wa+^Q t cb. (24) 



We have found that tuning parameters using ( |2l| ) for which we had reliable technology did not optimise the swap 
acceptance of our simulations. The reasons for this are discussed below. 

Consider first the distance a 2 between actions Si where the Si are as given by Then 

A12 = A(3Wn + AT (25) 

with 

A/3 = /3 2 - p 1 (26) 

AT = T 2 — Ty. (27) 

The variance of A12 in an individual subensemblc is 

a 2 (A 12 ) J = ((A/3n^ + Af) 2 }. i . (28) 

One can see that for a given AT one can tune A/3 to minimise this variance. 

However when one examines the case of the pseudofermionic action of (|24| ) one finds that 

A 12 = A(3W n + ^ (Qi - Q a ) </>. (29) 

When calculating the variance of A12 one encounters the quadratic term 

(0 t (Qi-g 2 )00 t (Qi-g 2 )0) l . (30) 

This term gives rise to both connected and disconnected pieces when the integration over the pseudofcrmion fields is 
carrried out 

(<t> j (Qi - g 2 )^ t (Qi - QMh = (Tr 2 ((g! - g 2 )g.r 1 ))f + <t>(Qi - Q2)Qt\Qi - Q2)Q7 1 )f. (3i) 

Here the superscript U on the expectations indicates that they are now to be carried out over the gauge fields only. 
Hence one finds that 

<r?(A 12 ) = {{A0Wn + T>((Qi ^WT 1 )?)? + (MQi - Q^QtHQi ~ Qi)QT^ ( 32 ) 
We also note that to first order in Q\ — Q2 

AT « Tt((Qi - Q 2 )Qi 1 ) (33) 

Comparing equations ( p8| ) and ( |32| ) it can be seen that using a pseudofermionic action gives rise to a connected piece 
in cr 2 (A 12 ) which one would not get using the action of This connected piece cannot be tuned away by changing 
A/3 and it increases the distances in parameter space compared to when the action of ( plf ) is used. If parameters 
are tuned using the action of ( ]2l] ) and the simulation is carried out using pseudofermions the acceptance rate of the 
swaps will not be optimised. 

With hindsight it may be said that using pseudofermions was not the best choice for performing our simulations, 
and that the action of (21) should have been evaluated on our swap attempts to calculate Ai 2 instead of using the 
pseudofermionic action to calculate AH. 
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III. AUTOCORRELATIONS 



The cost of measuring observables 

The gain from PT is expected to come from the swapping of configurations between sub-ensembles. This reduction 
in autocorrelation time is supposed to occur due to the fact that the sub-ensembles are simulated (between swaps) 
with independent Markov processes. However the swaps couple the ensembles and include cross correlations between 
them. Thus care must be taken when using results from separate subensembles together. 

According tojl^jl^] if successive measurements of O are correlated, the sample mean O is given (we use the 



convention of |16|) by the formula: 



O=(O)±^^+1^(0). (34) 

Here, a 2 (O) is the variance of operator O given by 

a 2 [O) = (O 2 ) - (O) 2 (35) 
and to is the integrated autocorrelation time, defined as 

oo 

ro = ^Co(<) (36) 
t=i 

and where Co(t) is the normalised autocorrelation function: 

° (0) 

and the expectation values are over all pairs of Oi separated by an interval t. From now on we shall drop the subscript 
O from these formulae except where necessary. Furthermore the term 'autocorrelation time' will always be used to 
refer to the integrated autocorrelation time. 

The practical meaning of the statements above is that 2t + 1 correlated measurements of O are needed in order 
to reduce the error in O by the same amount as if two uncorrelated measurements were used. Markov methods in 
general produce correlated sequences of configurations, and hence correlated sequences of measured observables. The 
integrated autocorrelation time t is therefore an important indicator of the performance of a Monte Carlo simulation 
that is carried out with the intention of measuring observable O. 

In particular, if one assumes that the autocorrelation function decays exponentially 

C{t) = exp{-fci} (38) 

with k > 0, one finds that 

exp{-fc} = -S- (39) 

T + 1 

which is a result we shall use later. 
Autocorrelations in twin sub ensemble PT 

We are interested in whether or not PT will reduce the integrated autocorrelation time of an observable measured on an 
ensemble with some parameter set relative to the corresponding autocorrelation time of the same observable measured 
on an ensemble generated at the same parameters using HMC. We refer to the former of these autocorrelation times 
as the PT autocorrelation time and the latter as the HMC autocorrelation time. 

Let us examine the situation of a PT system with two sub-ensembles. Sub-ensemble 1 has the desired parameter 
set, and the other sub-ensemble has its parameters chosen so as to give some acceptance rate (A). We assume that the 
HMC autocorrelation functions of both ensembles are the same. We demonstrate in section [V] that over the distances 
in parameter space for which we can use PT, and with the statistics available, we cannot differentiate between 
the autocorrelation times of the plaquette operator between sub-ensembles, so we regard the above assumption as 
reasonable. 

Having made the above assumption, the changes in the autocorrelation time due to PT are now controlled solely 
by the number of successful swaps between the sub-ensembles. The swap probability in general depends on the 
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particular PT state at which the swap is attempted, but for simplicity we assume that we can replace individual swap 
probabilities with the overall average swap probability which is none other than the acceptance rate (A). 

Let the HMC autocorrelation function be denoted Cjj (t) , and the PT autocorrelation function of the sub-ensemble 
of interest be denoted Cpr(t). Consider the connected autocorrelation function: 

n—t 

C H (t) = ^r^Oi+A (40) 

where n is the number of samples of Oi . 

The autocorrelation function in the PT ensemble of interest can now be written as: 

Cprit) = {S e + S } (41) 

where 

S e = ]T O l+t O t (42) 

even 

S = Y,£>i+t£>i- (43) 

odd 

By the even sum we mean that the only terms contributing to the sum are those where an even number of swaps 
succeeded out of the t tried between the measurements of Oi+t and 0;. 

Given some configuration in one sub-ensemble, after an odd number of successful swaps it can only be in the other 
one. As the HMC steps are independent in different sub-ensembles, we expect (to a first approximation) no correlation 
between configurations in a sub-ensemble that are separated by an odd number of swaps. Hence we assume that S 
sums to zero and we consider only the .S^ term. 

We then rewrite ( |4l| ) as: 

C PT = PeC H {t) (44) 
where P e is the probability that an even number of successful swaps occur in t trials. P e is given by 

P e =Y J Cl{\-(A))^(AY (45) 

i 

where the index i runs from to the largest even integer less than or equal to t, i is even and C\ is the number of 
ways of choosing i swaps from t. 

Carrying out the sum in equation (W3) one finds 



P e = 
2 



l -{l + {l-2(A)) t ) (46) 
leading to the result: 

C PT {t)= 1 -{l + {l-2{A)) t }c H {t). (47) 

We consider three separate cases. 

i) (A) =0: In this case Cpx(t) = Cn(t), which is what we expect when we do not carry out any successful 
swaps. 

ii) < (A) < ^: In this case Cpx £ [\ChtCh) and we can see a reduction in the autocorrelation function of 
at most a factor of 2. 

iii) \ < (A) < 1: In this case the term (1 — 2(A)) 1 in equation ( |47| ) becomes oscillatory. In particular if (A) = 1 
(every swap succeeds) it is impossible to get an even number of successful swaps out of an odd number of trials, 
whereas it is a certainty for an even number of trials. 
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If one models the autocorrelation function by an exponential decay as in (pq), it is possible to calculate the PT 
integrated autocorrelation time for the ensemble: 

oo 

T PT = Y, C PT( t ) (48) 

1 

1 1 °° 

= 2™ + 2 ^ ((1 " 2(A)) exp ^ fc » 4 ( 4Q ) 
l 

_ t h [1 + (r g - 1)] 

- 1 + 2(A)th (50) 

where the last line follows from using (l39|), summing the resulting geometric series and simplifying. The ratio of tpt 
to th is then: 

tpt 1 + (A)(t h -1) 

t h 1 + 2(A)t h ' 1 1 



We remark on several features of the ratio in (51) 



i) When (A) = 0, one is, in effect, carrying out two uncoupled HMC simulations and the autocorrelation times in 
each sub-ensemble remain the same as they would be for HMC simulations. 

ii) For a fixed (A) £ (0, |) increasing th from has the effect that the ratio of ( |5l| ) approaches the value of i from 
above. The closer (A) is to |, the faster this limit is approached. If one is interested in both subensembles this 
is still a gain. If one of the two ensembles serves only to decorrelate the other and is not otherwise interesting (it 
is thrown away at the end) then one would lose over HMC as one would have done twice the work, but gained 
less than a factor of two. 

iii) For (A) = ~ the ratio is exactly \ and a breakeven is reached, in the sense that one does the work of two 
simulations, but in each subensemble the integrated autocorrelation is halved. This is the stage when a sub- 
ensemble which originally served no other purpose than to help decorrelate the other one may be thrown away 
without losing out. 

iv) For (A) £ (|, 1] the ratio approaches \ rapidly from below. In this case one clearly wins even if one is only 
interested in a single sub-ensemble. However the gain is not much, as for any reasonable value of th the ratio 
will have already approached the asymptotic limit of \ to a good level of accuracy. 

One can therefore win most with PT when the acceptance rate is very high, and the observable of interest has a 
very short autocorrelation time. In such a situation it is possible to gain more than a factor of two over the HMC 
autocorrelation time in each ensemble if the swap acceptance rate is greater than i. However if an observable has 
such a short HMC autocorrelation time, it may not be worthwhile employing PT. Parallel tempering was supposed to 
be used to decorrelate observables with long autocorrelation times. In a typical situation, it would be expected that 
the gain in each ensemble is very close to a factor of 2. 



IV. SIMULATION DETAILS 



Our PT simulations were carried out on the Cray T3E in Edinburgh. Code for performing the HMC trajectories 
was taken from the GHMC code written for the UKQCD Dynamical Fermions project, described in [fl8|| . 

Program Features 

The PT code ran trajectories on each sub ensemble in series. Sub- ensembles were labelled from to N — 1, where iV 
was the total number of sub-ensembles. Swaps of configurations between sub-ensembles were attempted according to 
a boolean plan matrix M. If, after carrying out the HMC trajectories in sub-ensemble i, the element M t j was found 
to contain true, the code would attempt to swap configurations j and j + 1. (j £ [0, N — 2]) The default matrix had 
all its elements set to false except for the last row which had all its elements set to true. This way the program would 
perform all the HMC trajectories on all the ensembles and would then attempt a chain of pairwise swaps. 

The number of HMC trajectories per sub-ensemble was controlled through an independent parameter file for each 
sub-ensemble. This way a sub-ensemble could be equilibrated with the GHMC code and if desired, they could easily 
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be taken and further simulated separately using the GHMC code. Likewise each sub-ensemble kept a separate set of 
log files for the plaquette and for solver statistics. The overall driver routine kept a log file of the success or failure of 
swap attempts and the swap energies. 

Simulation Parameters 

Five PT simulations 51, 52, S3, 54 and 55 were performed, each of which comprised two sub-ensembles. The 
parameters for these simulations are shown in table |. In all five simulations one sub-ensemble had parameters 
(/3 = 5.2, c = 2.0171, k = .13300). The parameters for the second sub-ensemble were given by action matching for 
SI, S2 and S3, while for 54 and 55 only k was varied. Thus we could investigate the PT swap acceptance rate for 
different distances in parameter space. 

We also had data from a previous HMC simulation with parameters ((3 = 5.2, c = 2.0171, k = .13300) on lattices of 
volume 8 3 x 16 and 8 3 x 24. 

The results from the reference run on the 8 3 x 16 lattice were used to validate the PT code. Our PT simulations 
were also carried out on lattices of this size. Furthermore, it was possible to compare the autocorrelation times of 
the plaquette from this HMC run with the autocorrelation times of the plaquette from the first sub-ensembles of 
the PT runs. For the second sub-ensembles, the GHMC code was used only to achieve equilibration. Thus there is 
insufficient data to calculate the HMC autocorrelation times of the second sub-ensembles. 

In the PT simulations each HMC step was one trajectory long. The plan matrix used was the default one described 
earlier. Simulations 51, 52 and 53 ran for 6000 swap attempts giving 6000 trajectories for each sub-ensemble, while 
54 and 55 ran for only 1000 swap attempts due to time constraints. 

The matching procedure was performed using with the reference HMC results from 8 3 x 24 lattices, using the 
methods outlined in p"2[ . 

Analysis 

We examined the acceptance rate as a function of the average swap energy change (AH), and of Ak = n-i — K\, the 
change in the hopping parameters. We investigated the autocorrelation time of the average plaquette. 

Errors in ensemble averages were estimated using the bootstrap method. Autocorrelations were estimated using 
the sliding window scheme of Sokal et al. J17| . 



V. RESULTS 

A summary of our results is shown in table [n| We show for each simulation A/3 = fii — the corresponding Ak, 
(AH), the acceptance rate (A), the integrated autocorrelation time r for the plaquette in sub-ensemble 1 and the 
autocorrelation time in sub-ensemble 1 divided by the HMC autocorrelation time, ti/thmc- 

Swap Acceptance Rate 

Figure |l| shows the measured swap acceptance rates of the simulations. The solid line is the acceptance rate formula 
in (|TT|). It can be seen that the measured results are in excellent agreement with its predictions. 

Calibration and Matching 

It can be seen from table [n| that simulations 52 and 53 which had parameters given by matching the Tr In actions of 
j2l| ) have lower acceptance rates than 54 and 55 for which tempering was carried out only in k. We expect that this 
is due to the noise term of (|3^) and is the result of using the pseudofermion action for calculating the swap energy 
differences. 

To see how large the effect of this noise term is, we can compare the residual variance a 1 (A12) from the matching 
procedure jl^] , using the Tr In action with the variance as measured in our PT simulations through (AH). Note that 
we only have biased estimators for <r 2 (Ai2) from the matching procecure, and that we have calculated the residual 
variance estimate only for Ak = .0005. 



Table III contains our comparison of the Tr In matching predictions and pseudofermionic measurements for sim- 
ulation 53. We can see in column 2, our biased estimate of the residual variance on matching and in column 4 the 
corresponding predicted acceptance rate. In column 3 we see the actual variance as measured in the simulation and 
in column 6 the corresponding measured acceptance rate. We expect the difference in the variances to be due to the 
four point term in equation (|3^). We can therefore numerically estimate the four point term to be 

(Tr(Q 2 - Q 1 )Qr\Q 2 - Q^Qr 1 )" = 6.6(2) (52) 

for simulation 53. 

Note that if during our swap acceptance steps, we would discard the pseudofermion fields, and calculate the energy 
change using the Tr In action by the methods outlined in |L2|, we would suffer a workload hit, but would expect 
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an accept rate of around 48% in the case of simulation S3. Thus using pseudofermions was a poor way to proceed 
originally. However as the action difference scales like the lattice volume, going to larger lattices would effectively 
cancel all the gain one could obtain by using the Tr In action to evaluate the swap action/energy difference. 

Autocorrelation Times and Efficiency 

The autocorrelation times of the plaquette operator on the sub-ensembles with parameter k — .1330 are shown in 
column 5 of table |fi|. We also show for comparison the autocorrelation time estimated from our independent HMC 



run at the same parameter set. In table IV we gather some estimates of the integrated autocorrelation time of 
the plaquette for some independent HMC runs at similar parameters to our PT runs. It can be seen that the HMC 
autocorrelation times agree with each other within estimated errors, justifying the assumptions of our model of section 

1 n 

Figure □ shows the ratio of PT to HMC autocorrelation times. The errors on the ratios were obtained by simple 
error combination. The line superimposed on the data in figure || is the prediction of the model in section [n] (c.f. 
equation |5l|) . It can be seen that it is not inconsistent with the data. 

VI. SUMMARY AND CONCLUSIONS 

In this paper we presented our study of the Parallel Tempering algorithm applied to lattice QCD with 0(a)- 
improved Wilson fermions. We showed how the algorithm satisfies detailed balance, and gave a formula for the swap 
acceptance rate in terms of the swap energy change ATI. We highlighted the connection of parallel tempering with the 
technology of action matching. We presented and discussed a simple model of autocorrelations in a twin sub-ensemble 
PT system, and found that the algorithm is unlikely to improve autocorrelation times by more than a factor of two 
for such a system. We verified our simple model assumptions by gathering autocorrelation time data from previous 
simulations. 

We carried out a numerical study where we verified the acceptance formula and the predictions of the autocorrelation 
model within statistical errors. We also obtained information on how the acceptance rate of the algorithm falls with 
increasing An. 

We found that using the pseudofermions from HMC on the swap attempt is a poor way to proceed if the parameters 
are matched for the Tr In action. We have shown analytically that there is an extra noise term in the definition of the 
distance between actions when pseudofermions are used. We have attempted to estimate the size of this noise term 
numerically 

We conclude that Parallel Tempering does not seem to give any real gain over HMC at the present time. We were 
unable to use PT to simulate sub ensembles sufficiently far apart in parameter space. The acceptance rate drops 
too quickly with Ak. This situation could be alleviated somewhat if the swap action/energy differences were to be 
calculated using the Tr In action, for simulations with parameters matched with that action. However in the end the 
real problem is that the swap action/energy change scales with the volume for a fixed kappa, and that when employing 
the PT algorithm on a realistic sized (eg 16 3 x 32) lattice, the scaling of the swap energy change would lower the 
acceptance rate and lose all that could be gained by using the Tr In action. 

Thus we could not take advantage of the fact that in one region of parameter space autocorrelation times are short 
while in another they are long. With our parameter values, the HMC autocorrelation times of our sub-ensembles are 
the same within experimental errors and the predictions of our model apply. A chain of sub-ensembles that would 
span the required distance in parameter space can be constructed, but would take an unfeasibly large number of 
sub-ensembles for lattices of interesting size. 
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TABLE I. Simulation parameters used for twin ensemble runs and the reference HMC run 



[h] Simulation 


((3l, Cl, Kl) 


(/?2, C2, K2) 


HMC 


(5.2,2.0171,0.133) 




SI 


(5.2,2.0171,0.133) 


(5.2060,2.01002,0.13280) 


S2 


(5.2,2.0171,0.133) 


(5.2105,2.00471,0.13265) 


S3 


(5.2,2.0171,0.133) 


(5.2150, 1.99940,0.13250) 


S4 


(5.2,2.0171,0.133) 


(5.2,2.0171,0.13280) 


S5 


(5.2,2.0171,0.133) 


(5.2,2.0171,0.13265) 



TABLE II. Results from the PT simulations showing the appropriate results from HMC for comparison 



Simulation 


A/3(xl0~ 3 ) 


Ak(x10~ 4 ) 


(AH) 


(^4) 


Tl 


ti/thmc 


HMC 










26(6) 


1 


SI 


6 


-2.0 


1.23(2) 


0.43(1) 


12(3) 


0.5(2) 


S2 


10.5 


-3.5 


3.76(4) 


0.17(1) 


19(4) 


0.7(2) 


S3 


15 


-7.5 


7.64(6) 


0.051(2) 


24(6) 


0.9(3) 


S4 





-2.0 


0.91(4) 


0.49(1) 


9(4) 


0.3(2) 


S5 





-3.5 


2.29(7) 


0.26(2) 


18(10) 


0.7(4) 



TABLE III. Comparison of Tr In matching and acceptance with pseudofermionic acceptance 

Simulation o- 2 (Ai 2 )Tr in o- 2 (A) p . f = (AH) (A) Tl i n {A) p . ( 

S3 1.02(20) 7.64(6) 0.48(5) 0.051(2) 



TABLE IV. The integrated autocorrelation times of some other simulations. 



p 


c 


K 


THMC 


5.2 


1.99 


.1335 


18(8) 


5.2 


2.0171 


.1330 


26(6) 


5.232 


1.98 


.1335 


20(6) 
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Swap Acceptance Rate v.s. < AH > 
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FIG. 1. Acceptance rate against {ATI). Error bars are smaller than the symbols 
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Comparison of Autocorrelation Times 
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FIG. 2. Integrated autocorrelation times for the plaquette normalised by that from GHMC simulations 
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